Advanced Topics in R

Information Visualization

Rafael S. de Souza

4/17/2017

A statistical computing environment

Some functionalities of R

Bayesian Inference         Machine Learning              Social Sciences
Computational Physics      Medical Image Analysis        Spatial Data
Cluster Analysis           Multivariate Statistics       Statistical Genetics
Differential Equations     Natural Language Processing   Survival Analysis
Econometrics               Numerical Mathematics         Time Series Analysis
Environmetrics             Optimization                  Visualization
Extreme Value Analysis     Pharmacokinetics              Web Technologies
Empirical Finance          Phylogenetics
Functional Data Analysis   Probability Distributions
                           Psychometrics

Required packages

library(ggplot2)
library(reshape2)
library(d3heatmap)
library(circlize)
library(ggdendro)

Basic commands

1+1
## [1] 2
x <- 2
for (i in 1:5) {
  print(x + i)
}
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
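
Arithmetic in R is vectorized, so the same result can be obtained without an explicit loop. A minimal sketch (the name `shifted` is introduced here only for illustration):

```r
x <- 2
shifted <- x + 1:5   # adds x to every element of 1:5 in one step
print(shifted)
## [1] 3 4 5 6 7
```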

Basic commands

x <- rnorm(100)
hist(x)

Simple regression model

set.seed(1056)                      # set seed to make the example reproducible
nobs <- 150                         # number of observations
x1 <- runif(nobs, 0, 5)             # uniform random predictor on [0, 5]
mu <- 1 + 5 * x1 - 0.75 * x1^2      # linear predictor, xb (quadratic in x1)
y <- rnorm(nobs, mu, sd = 0.5)      # response: normal variate centered on mu
fit <- lm(y ~ x1 + I(x1^2))         # least-squares fit with normal errors
summary(fit)
Fitting linear model: y ~ x1 + I(x1^2)
               Estimate   Std. Error   t value   Pr(>|t|)
x1             4.867      0.1057       46.05     3.086e-89
I(x1^2)        -0.7204    0.02074      -34.74    9.438e-73
(Intercept)    1.118      0.1111       10.06     1.92e-18
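
Because the data are simulated from known coefficients (1, 5 and -0.75), the estimates can be checked against the truth. The sketch below repeats the simulation so that it runs on its own; the names `truth` and `ci` are introduced here for illustration and are not part of the original slides.

```r
set.seed(1056)
nobs <- 150
x1 <- runif(nobs, 0, 5)
mu <- 1 + 5 * x1 - 0.75 * x1^2
y <- rnorm(nobs, mu, sd = 0.5)
fit <- lm(y ~ x1 + I(x1^2))

# Coefficients used to generate the data, named like the model terms
truth <- c("(Intercept)" = 1, "x1" = 5, "I(x1^2)" = -0.75)

# 95% confidence intervals, one row per coefficient
ci <- confint(fit, level = 0.95)

# True values, estimates, and interval bounds side by side
cbind(truth = truth[rownames(ci)], estimate = coef(fit), ci)
```

If the fit is adequate, each true value should typically fall inside its interval.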

Plot results

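xx <- seq(0, 5, length.out = 200)                                          # grid of x values for the fitted curve
ypred <- predict(fit, newdata = data.frame(x1 = xx), type = "response")    # prediction from the model on the grid

plot(x1, y, pch = 19, col = "red")                                         # plot the simulated data
lines(xx, ypred, col = "cyan", lwd = 4, lty = 2)                           # add the fitted regression curve

segments(x1, fitted(fit), x1, y, lwd = 2, col = "gray")                    # add the residuals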
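
The fitted curve can also be drawn with an uncertainty band. The sketch below is one possible extension, not part of the original slides: it asks `predict()` for prediction intervals and draws the bounds as dotted lines. The name `band` is an assumption, and the simulation is repeated so that the block runs on its own.

```r
set.seed(1056)
nobs <- 150
x1 <- runif(nobs, 0, 5)
y <- rnorm(nobs, 1 + 5 * x1 - 0.75 * x1^2, sd = 0.5)
fit <- lm(y ~ x1 + I(x1^2))

xx <- seq(0, 5, length.out = 200)
# Matrix with columns "fit", "lwr", "upr" (95% prediction interval, the default level)
band <- predict(fit, newdata = data.frame(x1 = xx), interval = "prediction")

plot(x1, y, pch = 19, col = "red")
lines(xx, band[, "fit"], col = "cyan", lwd = 4, lty = 2)   # fitted curve
lines(xx, band[, "lwr"], col = "gray40", lty = 3)          # lower bound
lines(xx, band[, "upr"], col = "gray40", lty = 3)          # upper bound
```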

Data Exploration

Read and display data in table format

d <- read.csv("exoplanets.csv", header = TRUE)
d <- d[complete.cases(d),]
head(d)
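
To see what `complete.cases()` does, here is a minimal sketch on a toy data frame. The columns of `exoplanets.csv` are not shown in these slides, so the names `mass`, `radius` and `period` are invented stand-ins.

```r
# Hypothetical stand-in for the exoplanet table; column names are invented
toy <- data.frame(
  mass   = c(1.2, NA, 0.8, 3.1),
  radius = c(1.0, 0.9, NA, 1.4),
  period = c(3.5, 10.2, 365.0, 4.1)
)

keep <- complete.cases(toy)   # TRUE only for rows with no NA in any column
keep
## [1]  TRUE FALSE FALSE  TRUE

clean <- toy[keep, ]          # keep only the complete rows
nrow(toy) - nrow(clean)       # number of rows dropped
## [1] 2
```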

Scatter plot
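
A minimal base-graphics sketch of a scatter plot. The code behind this slide is not shown, so the built-in `iris` data set is used as a stand-in for the exoplanet data frame `d`; the column choices are assumptions made only for illustration.

```r
# iris ships with R and stands in for the exoplanet data frame
dat <- iris

plot(dat$Sepal.Length, dat$Petal.Length,
     pch = 19, col = as.integer(dat$Species),   # one color per group
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
legend("topleft", legend = levels(dat$Species), col = 1:3, pch = 19)
```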

Boxplot
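
A possible boxplot, again with `iris` standing in for the exoplanet data (an assumption, since the original code is not shown). `boxplot()` also returns the statistics it draws, which is handy for checking the figure.

```r
dat <- iris   # stand-in data set that ships with R

# One box per group, using the formula interface
bp <- boxplot(Petal.Length ~ Species, data = dat,
              col = "lightblue", ylab = "Petal length (cm)")

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
```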

Linear Fit
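
A straight-line fit can be overlaid on a scatter plot with `lm()` and `abline()`. This is a sketch under the assumption that the slide showed a simple linear trend; `iris` is a stand-in data set and `lin` is a name introduced here.

```r
dat <- iris   # stand-in data set that ships with R

lin <- lm(Petal.Length ~ Sepal.Length, data = dat)   # simple linear regression

plot(Petal.Length ~ Sepal.Length, data = dat, pch = 19, col = "red")
abline(lin, col = "cyan", lwd = 3)                   # draw the fitted line

summary(lin)$r.squared                               # fraction of variance explained
```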

Histograms
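
A histogram on the density scale with a kernel density estimate drawn on top. This is a sketch with `iris` as a stand-in for the exoplanet data; the number of breaks is an arbitrary choice that `hist()` treats only as a suggestion.

```r
dat <- iris   # stand-in data set that ships with R

# freq = FALSE puts the bars on the density scale, so the areas sum to 1
h <- hist(dat$Sepal.Length, breaks = 20, freq = FALSE,
          col = "gray80", main = "", xlab = "Sepal length (cm)")
lines(density(dat$Sepal.Length), lwd = 2)   # kernel density estimate
```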

Heatmap

Heatmap
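
One way to draw a correlation heatmap with base R only. The slides load `d3heatmap`, but the call behind this slide is not shown, so `heatmap()` from the stats package is swapped in here and the numeric columns of `iris` stand in for the exoplanet variables.

```r
num <- iris[, 1:4]   # numeric columns of a stand-in data set
cm <- cor(num)       # pairwise Pearson correlations

# symm = TRUE orders rows and columns identically;
# scale = "none" keeps the correlations as they are
heatmap(cm, symm = TRUE, scale = "none", margins = c(8, 8))
```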

Dendrograms
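
A dendrogram comes from hierarchical clustering. The sketch below uses `hclust()` from base R rather than `ggdendro`, with `iris` as a stand-in; the linkage method and the choice of three groups are assumptions made for illustration.

```r
num <- scale(iris[, 1:4])                     # standardize the stand-in variables
hc <- hclust(dist(num), method = "average")   # average-linkage clustering on Euclidean distances

plot(hc, labels = FALSE, main = "", xlab = "", sub = "")   # draw the dendrogram

groups <- cutree(hc, k = 3)   # cut the tree into three clusters
table(groups)                 # cluster sizes
```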

Chord Diagram

nc <- cor(d[,c(2,3,5,6,7)])
chordDiagram(nc)
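
A correlation matrix is symmetric and may contain negative entries, so it can help to prepare it before drawing. The sketch below is one option, not the author's method: it takes absolute values so that link widths reflect strength regardless of sign, and passes `symmetric = TRUE` so that each pair is drawn once. `iris` is a stand-in data set, `links` is a name introduced here, and the plot is skipped if `circlize` is not installed.

```r
num <- iris[, 1:4]   # numeric columns of a stand-in data set
nc <- cor(num)       # correlation matrix, as on the slide

links <- abs(nc)     # strength of association, ignoring sign

if (requireNamespace("circlize", quietly = TRUE)) {
  # symmetric = TRUE uses only the lower triangle, without the diagonal
  circlize::chordDiagram(links, symmetric = TRUE)
}
```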